Technote 1141: Extending and Controlling Sherlock
By John Montbriand, Apple Worldwide Developer Technical Support
CONTENTS
Overview
AppleScript Support
The Optional kAEOpenDocuments Apple Event Parameter
Change History
Acknowledgments
Mac OS 8.5 includes several enhanced searching capabilities, known collectively as Sherlock. Previously, the Mac OS Find application allowed users to search mounted disk volumes for files based on information such as name, modification date, and file type. Sherlock retains this functionality, but also extends the user’s search options to include both the content of files and the Internet.

Sherlock 2 adds a number of new features to the array of search options presented to the user. To accommodate those new features, some additions have been made to the Internet Search Plug-in language, new AppleScript commands have been added, and an additional routine has been added to the FindByContent suite. Where appropriate, these new features are described in this document.

Find By Content library information formerly found in this note has been moved to Technote TN1180, “Sherlock’s Find By Content Library.”
Overview

To perform an Internet search, the Sherlock application sends query information to one or more Internet search sites. The information returned by the search sites is interpreted by the Sherlock application and then displayed for perusal. Because each Internet search site has its own format for query and response information, the Sherlock application uses plug-ins that describe the data formats expected and provided by individual sites, both for formatting queries and for parsing response data. Internet search site providers interested in building their own search plug-ins will find directions for doing so in the Internet Search Plug-ins section.

AppleScript commands are available for accessing the content-based search and Internet search facilities provided by the Sherlock application. These include commands for searching by content, a command for indexing volumes, and commands for performing Internet searches. These commands are discussed in greater detail in the AppleScript Support section.

The Sherlock application, when asked to open a file that was found by way of a content-oriented search, attaches information about the search and why the file was selected to the 'odoc' Apple event it passes to the Finder. The Finder passes this information along to applications as a property associated with that Apple event; see The Optional kAEOpenDocuments Apple Event Parameter section.

Find By Content is a new system-level facility implemented as a Code Fragment Manager library. The Sherlock application is a client of Find By Content and uses its search facilities for performing content-based searches. Developers interested in using the Find By Content services from within their applications may do so by linking against the PowerPC Code Fragment Manager library named “Find By Content” (without the quotes). Routine descriptions and examples, formerly part of this note, are now provided in Technote TN1180.

Internet Search Plug-ins

The “Search Internet” feature in the Sherlock application allows users to perform Internet searches using one or more Internet search engines. The Sherlock application itself contains no information about the exact data formats expected or generated by individual Internet search engines. When accessing a particular Internet search site, the Sherlock application uses a search plug-in file that describes the data formats both expected by the site for queries and produced by the site in its responses to queries. Internet Search Interface Language (ISIL) is the language used in search plug-in files, so Internet search site administrators may provide their own search plug-in files.

ASCII text describing the search site is contained in a search plug-in’s data fork. The resource fork may be used for custom icons, Finder strings, et cetera. Search plug-in files have the creator code

ISIL is modeled closely after the HTML it is used to describe, so HTML authors familiar with the syntax should have little or no trouble creating their own search plug-in files. An exact specification of the language can be found in the Internet Search Interface Language BNF section, and the sections that follow discuss the language in greater detail.

To create a search plug-in file, you will need a text editor program (SimpleText will do) and a utility that will allow you to change the plug-in’s file type. The basic steps for editing a search plug-in file are:
If your text editor edits any file regardless of type and does not change the types of the files it edits, you can skip steps 3 and 5.

The Sherlock application scans the “Internet Search Sites” folder only once, when it is starting up. Restart the Sherlock application each time you would like to test your search site file.
<SEARCH
    name = "<search engine name>"
    method = ["get" | "post"]
    action = "<url to address>"
    [update = "<url containing update file>"]
    [updateCheckDays = "<days between update pings>"]
    [description = "<human-readable-description>"]
    [bannerImage = "<url containing banner image>"]
    [bannerLink = "<url to load when banner clicked>"]>
    ....
    <INPUT name = "<input name>" value = "<value>" [mode = "results"]>
    <INPUT name = "<input name>" value = "<value>" [mode = "browser"]>
    ....
    <INPUT name = "<input name>" user>
    ....
    <INTERPRET
        [bannerStart = "<text>"]
        [bannerEnd = "<text>"]
        [relevanceStart = "<text>"]
        [relevanceEnd = "<text>"]
        [resultListStart = "<text>"]
        [resultListEnd = "<text>"]
        [resultItemStart = "<text>"]
        [resultItemEnd = "<text>"]
        [skipLocal=true]
        [charset = "<text>"]
        [resultEncoding = <integer>]
        [resultTranslationEncoding = <integer>]
        [resultTranslationFont = "<text>"]>
    ....
</SEARCH>
Listing 1. Typical layout for a SEARCH block in a search plug-in file. |
Search blocks begin with the <SEARCH ....> tag (containing a number of attributes, as described in Table 1) and end with a </SEARCH> tag. Within a typical search block describing an Internet search site, there will be one or more INPUT tags and an INTERPRET tag. The SEARCH block attributes describe the search site, how it is to be accessed, and where updates to the search plug-in file can be found. |
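To make the relationship between the SEARCH action and the INPUT tags concrete, here is a minimal Python sketch of how a "get" query URL could be assembled from them. It is an illustration only, not Sherlock's actual implementation. The action path `/search`, the input name `query`, and the function `build_query_url` are hypothetical. The sketch assumes that the `user` input receives the text typed by the user, and that an input carrying a `mode` attribute is sent only when results are displayed in that mode.

```python
from urllib.parse import urlencode

# Hypothetical description of a search site, mirroring Listing 1.
ACTION = "http://clarus.apple.com/search"  # assumed action URL
INPUTS = [
    {"name": "sv", "value": "AP", "mode": "results"},  # sent when Sherlock shows results
    {"name": "sv", "value": "IS", "mode": "browser"},  # sent when a browser shows results
    {"name": "query", "user": True},                   # receives the user's search text
]

def build_query_url(action, inputs, user_text, mode="results"):
    """Build a GET query URL from INPUT descriptions (sketch only)."""
    pairs = []
    for inp in inputs:
        inp_mode = inp.get("mode")
        if inp_mode is not None and inp_mode != mode:
            continue  # this input belongs to the other display mode
        value = user_text if inp.get("user") else inp.get("value", "")
        pairs.append((inp["name"], value))
    return action + "?" + urlencode(pairs)

print(build_query_url(ACTION, INPUTS, "dog cow", mode="results"))
print(build_query_url(ACTION, INPUTS, "dog cow", mode="browser"))
```

Only the "get" method is sketched here. For "post", the same name/value pairs would presumably travel in the request body instead of the URL.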
Table 1. SEARCH block attributes.
|
|
The
Here, &sv=AP will be sent to the server when the Sherlock application will be used to display the results, and &sv=IS will be sent to the server when a web browser will be used to display the results. The |
Table 2. INTERPRET tag attributes.
|
*An internet search source plug-in can specify |
The attributes It is possible, though, that the Sherlock application
will not be able to recognize a text encoding by name. For
these cases, search plug-in creators can explicitly specify
the character encoding that will be used in responses to
queries by using the For example, if a result page returned from a search site
was encoded using the “euc-jp” character set (in
With Sherlock 2, a plug-in can support multiple |
An Example

In this example, it is assumed that the Internet search site for which we are writing the search plug-in file is located at the URL <http://clarus.apple.com>. (As of this writing, this site does not exist, although the following text is written as if the site does exist. If the site did exist, it would presumably enable visitors to search for information regarding Clarus the Dogcow. An explanation of how visitors other than dogcattle would make use of the search results is beyond the scope of this document and is left as an exercise for the reader.)
Step 1: Describe the site in the opening SEARCH tag.

Using your web browser, go to the search site and view
the HTML source for the web page. Somewhere in the HTML, you
should find a
Or, it is possible that the action may be specified as a local string as follows:
If the action is specified as a local string, then prefix
it with the address in the
From the HTML source, we were able to determine that the
action is |
Step 2: Define the INPUT tags.

There are two ways to determine what inputs are expected by an Internet search site. The first method is to manually perform a query and look at the URL that is sent to the server. The second is to pick through the HTML to discover the information.

The Query Method. Looking at the query information is the simplest method. For example, if we go to the search site in our web browser, type the query string “coffee”, and start a search, then we may observe a URL that looks like this:
From this URL, we can locate the inputs. The inputs come after the “?” and are separated by ampersand characters (&). In this query, the inputs are as follows:
Using this information, we can construct the following two INPUT tags:
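The query method can also be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of Sherlock: the URL below, including its `/search` path and its `query` and `sv` input names, is hypothetical, as is the helper `suggest_input_tags`. The helper splits the query portion of an observed URL into name/value pairs and proposes one INPUT tag per pair, marking the pair whose value equals the typed query string as the `user` input.

```python
from urllib.parse import urlsplit, parse_qsl

def suggest_input_tags(observed_url, typed_query):
    """Propose INPUT tags from a query URL captured in the browser (sketch)."""
    query_part = urlsplit(observed_url).query  # everything after the "?"
    tags = []
    for name, value in parse_qsl(query_part, keep_blank_values=True):
        if value == typed_query:
            # This input carried the text we typed, so the user supplies it.
            tags.append('<INPUT name="%s" user>' % name)
        else:
            # Any other input is treated as a fixed value.
            tags.append('<INPUT name="%s" value="%s">' % (name, value))
    return tags

# Hypothetical URL observed after searching for "coffee".
url = "http://clarus.apple.com/search?query=coffee&sv=AP"
for tag in suggest_input_tags(url, "coffee"):
    print(tag)
```

The suggested tags are only a starting point. Fixed inputs may be optional or may change meaning with different values, so trying several queries is still worthwhile.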
There may be some optional parameters available on a search site, so trying different options and queries may yield more useful information. The HTML Method. If the inputs are not present in
the URL then they must be determined by looking at the HTML
source. Here, we look for the
Between the
Again, this information can be used to construct the
following two
Experimenting with these input parameters and writing
different types of query URLs can provide useful information
about their meaning and use. For instance, after writing
several variations of the query URL, we discovered that
Now that the inputs have been determined, there is enough information to put together a complete search plug-in file:
However, in this form, although it will be possible for
queries to be sent and results to be displayed, the lack of
an |
Step 3: Describe the results in the INTERPRET tag.

Determining the text delimiters located in the responses returned by Internet search engines requires examination of the HTML source returned as the response to one or more queries. From this data, we can determine text patterns delimiting interesting parts of the response information. For example, suppose the following were returned as a response to a query:
<HTML><HEAD><TITLE>Sample Results</TITLE></HEAD><BODY>
<A HREF="http://www.apple.com"><IMG SRC="http://www.apple.com/main/elements/apple.gif" ALT="Apple Computer"></A>
<P><SMALL>90%</SMALL><A HREF="http://www.apple.com/hotnews/">Hot News</A>
Apple Hot News - http://www.apple.com/hotnews
<BR><A HREF="http://www.apple.com">Apple Computer</A></P>
<P><SMALL>85%</SMALL><A HREF="http://www.apple.com/products/">Apple Products</A>
Apple - Products - http://www.apple.com/products
<BR><A HREF="http://www.apple.com">Apple Computer</A></P>
</BODY></HTML>
Listing 2. A sample HTML response to a query. |
From this information, we can see that the banner section is delimited by the text patterns “<BODY>” and “<P>” as follows:
The list of results is delimited by the text patterns "</A>" and "</BODY>":
Each item in the list of results is bracketed by the text patterns "<P>" and "</P>":
And, the relevance score for each item is bracketed by the text patterns "<SMALL>" and "</SMALL>":
Putting this all together, the complete search plug-in file would have the following contents:
|
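The delimiter-based interpretation described in Step 3 can be approximated in Python as follows. This is a sketch of the idea only; how the Sherlock application actually scans responses is not specified here. The helper names `between` and `interpret` are hypothetical. The dictionary keys reuse the INTERPRET attribute names from Listing 1, and the delimiter values are the ones identified above for the response in Listing 2.

```python
import re

RESPONSE = """<HTML><HEAD><TITLE>Sample Results</TITLE></HEAD><BODY>
<A HREF="http://www.apple.com"><IMG SRC="http://www.apple.com/main/elements/apple.gif" ALT="Apple Computer"></A>
<P><SMALL>90%</SMALL><A HREF="http://www.apple.com/hotnews/">Hot News</A> Apple Hot News <BR><A HREF="http://www.apple.com">Apple Computer</A></P>
<P><SMALL>85%</SMALL><A HREF="http://www.apple.com/products/">Apple Products</A> Apple - Products <BR><A HREF="http://www.apple.com">Apple Computer</A></P>
</BODY></HTML>"""

PATTERNS = {
    "bannerStart": "<BODY>", "bannerEnd": "<P>",
    "resultListStart": "</A>", "resultListEnd": "</BODY>",
    "resultItemStart": "<P>", "resultItemEnd": "</P>",
    "relevanceStart": "<SMALL>", "relevanceEnd": "</SMALL>",
}

def between(text, start, end, pos=0):
    """Return (text between start and end, index just past end), or (None, pos)."""
    i = text.find(start, pos)
    if i < 0:
        return None, pos
    i += len(start)
    j = text.find(end, i)
    if j < 0:
        return None, pos
    return text[i:j], j + len(end)

def interpret(html, p):
    """Split a response into its banner text and (relevance, url, title) results."""
    banner, _ = between(html, p["bannerStart"], p["bannerEnd"])
    result_list, _ = between(html, p["resultListStart"], p["resultListEnd"])
    results, pos = [], 0
    while result_list is not None:
        item, pos = between(result_list, p["resultItemStart"], p["resultItemEnd"], pos)
        if item is None:
            break  # no more result items
        relevance, _ = between(item, p["relevanceStart"], p["relevanceEnd"])
        # The first anchor with an HREF in the item is taken as the result link.
        link = re.search(r'<A\s+HREF="([^"]*)"[^>]*>(.*?)</A>', item, re.I | re.S)
        if link:
            results.append((relevance, link.group(1), link.group(2)))
    return banner, results

banner, results = interpret(RESPONSE, PATTERNS)
for relevance, url, title in results:
    print(relevance, url, title)
```

Because each pattern is matched as plain text, a delimiter that also occurs earlier in the page (for example, an extra "</A>" before the banner) would shift the extracted section. Choosing patterns that are unique in the response avoids this.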
Internet Search and XML Search Results

It is possible that a search engine may provide a separate machine-readable interface such as Extensible Markup Language (XML).
<searchResponse>
    <advertisement>
        <a href="http://www.advertiser.com">
            <img src="ad.gif">
        </a>
    </advertisement>
    <searchResults>
        <resultItem>
            <b><relevance>67%</relevance></b>
            <link><a href="http://www.foo.com">Title</a></link><br/>
            <summary>Summary</summary>
        </resultItem>
    </searchResults>
</searchResponse>
Listing 3. A sample XML document. |
At the time of this document’s creation, the XML
specification is still under development; however, using the
current state of the standard, the Internet Search Interface
can be easily configured to interpret XML result lists. For
example, the
|
<HTML><HEAD><TITLE>Sample Results</TITLE></HEAD><BODY>
<!-- BANNER START -->
<A HREF="http://www.apple.com"><IMG SRC="http://www.apple.com/main/elements/apple.gif" ALT="Apple Computer"></A>
<!-- BANNER END -->
<!-- RESULT LIST START -->
<!-- RESULT ITEM START -->
<P><SMALL><!-- RELEVANCE START -->90% <!-- RELEVANCE END --></SMALL><A HREF="http://www.apple.com/hotnews/">Hot News</A>
Apple Hot News - http://www.apple.com/hotnews
<BR><A HREF="http://www.apple.com">Apple Computer</A></P>
<!-- RESULT ITEM END -->
<!-- RESULT ITEM START -->
<P><SMALL><!-- RELEVANCE START -->85% <!-- RELEVANCE END --></SMALL><A HREF="http://www.apple.com/products/">Apple Products</A>
Apple - Products - http://www.apple.com/products
<BR><A HREF="http://www.apple.com">Apple Computer</A></P>
<!-- RESULT ITEM END -->
<!-- RESULT LIST END -->
</BODY></HTML>
Listing 4. A simple HTML response to a query that includes delimiting comments. |
Banner Advertisements

The Sherlock application uses the first HTML anchor in the banner section that includes both a hypertext jump and an image as the banner image. For best results, banner advertisements should be enclosed in an HTML anchor that includes both a hypertext jump (HREF attribute) and an IMG tag that includes a SRC attribute and, preferably, an ALT attribute. For example, the HTML anchor shown below illustrates the suggested format for banner advertisements:
Result Lists

When interpreting search results, the Sherlock
application identifies results by looking for HTML anchors
containing hypertext jump attributes. At least one anchor
including a hypertext jump (HREF attribute) should occur
between the text patterns specified in
|
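Site administrators who want to check that their pages contain anchors of the kind described above could use a small Python script such as the following. It is a hedged sketch built on Python's standard `html.parser` module, not a description of Sherlock's own parser. The class `AnchorCollector` and the helpers `banner_anchor` and `result_anchor` are hypothetical names.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect anchors that have an HREF, noting any IMG found inside them."""

    def __init__(self):
        super().__init__()
        self.anchors = []     # completed anchors, in document order
        self._current = None  # anchor currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "a" and attrs.get("href"):
            self._current = {"href": attrs["href"], "img": None, "alt": None, "text": ""}
        elif tag == "img" and self._current is not None and attrs.get("src"):
            self._current["img"] = attrs["src"]
            self._current["alt"] = attrs.get("alt")

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.anchors.append(self._current)
            self._current = None

def collect_anchors(html):
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.anchors

def banner_anchor(banner_html):
    """First anchor with both a hypertext jump and an image, or None."""
    return next((a for a in collect_anchors(banner_html) if a["img"]), None)

def result_anchor(item_html):
    """First anchor with a hypertext jump in a result item, or None."""
    return next(iter(collect_anchors(item_html)), None)

banner = ('<A HREF="http://www.apple.com"><IMG SRC="http://www.apple.com/'
          'main/elements/apple.gif" ALT="Apple Computer"></A>')
print(banner_anchor(banner))
```

A banner section for which `banner_anchor` returns None, or a result item for which `result_anchor` returns None, does not follow the format recommended above.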
Internet Search Interface Language BNF

All tags are case-insensitive and white space is ignored.

<search-interface> ::= <search-start> <input-interp-list> <search-end>

<search-start> ::= <left-angle-bracket> search <search-attr-list> <right-angle-bracket>
<search-attr-list> ::= <search-attribute> <search-attr-list> | <search-attribute> | <empty>
<search-end> ::= <left-angle-bracket> /search <right-angle-bracket>
<search-attribute> ::= <name> | <method> | <action> | <update> | <updateCheckDays> | <description> | <banner-link> | <banner-image> | <route-type>
<name> ::= name <attrib-assign>
<method> ::= method <attrib-assign>
<action> ::= action <attrib-assign>
<update> ::= update <attrib-assign>
<updateCheckDays> ::= updateCheckDays <attrib-assign>
<description> ::= description <attrib-assign>
<banner-link> ::= bannerlink <attrib-assign>
<banner-image> ::= bannerimage <attrib-assign>
<route-type> ::= routeType <white-space> = <white-space> <channel>

<input-interp-list> ::= <iip-list-item> <input-interp-list> | <iip-list-item>
<iip-list-item> ::= <interpret> | <input>
<input> ::= <left-angle-bracket> input <input-attr-list> <right-angle-bracket>
<input-attr-list> ::= <input-attribute> <input-attr-list> | <input-attribute> | <empty>
<input-attribute> ::= <name> | <value> | <user-select>
<value> ::= value <attrib-assign>
<user-select> ::= user

<interpret> ::= <left-angle-bracket> interpret <interpret-attr-list> <right-angle-bracket>
<interpret-attr-list> ::= <interpret-attribute> <interpret-attr-list> | <interpret-attribute> | <empty>
<interpret-attribute> ::= <rl-start> | <rl-end> | <ri-start> | <ri-end> | <banner-start> | <banner-end> | <rel-start> | <rel-end> | <skip-local> | <new-interpret-attr>
<rl-start> ::= resultListStart <attrib-assign>
<rl-end> ::= resultListEnd <attrib-assign>
<ri-start> ::= resultItemStart <attrib-assign>
<ri-end> ::= resultItemEnd <attrib-assign>
<banner-start> ::= bannerStart <attrib-assign>
<banner-end> ::= bannerEnd <attrib-assign>
<rel-start> ::= relevanceStart <attrib-assign>
<rel-end> ::= relevanceEnd <attrib-assign>
<skip-local> ::= skipLocal

<new-interpret-attr> ::= <price-start> | <price-end> | <avail-start> | <avail-end> | <date-start> | <date-end> | <name-start> | <name-end>
<price-start> ::= priceStart <attrib-assign>
<price-end> ::= priceEnd <attrib-assign>
<avail-start> ::= availStart <attrib-assign>
<avail-end> ::= availEnd <attrib-assign>
<date-start> ::= dateStart <attrib-assign>
<date-end> ::= dateEnd <attrib-assign>
<name-start> ::= nameStart <attrib-assign>
<name-end> ::= nameEnd <attrib-assign>

<channel> ::= <attrib> | <predefined-channel>
<predefined-channel> ::= " <predef-channel-name> "
<predef-channel-name> ::= internet | people | apple | reference | news | shopping

<attrib-assign> ::= <white-space> = <white-space> <attrib>
<attrib> ::= <quotestr> | <doublequotestr> | <noquotestr>
<quotestr> ::= ' one-or-more-letters-not-including-a-single-quote '
<doublequotestr> ::= " one-or-more-letters-not-including-a-double-quote "
<noquotestr> ::= one-or-more-letters-not-including-a-space-character

<white-space> ::= <space-character> <white-space> | <space-character>
<space-character> ::= #0x20 | #0x09 | #0x0D | #0x0A
<left-angle-bracket> ::= <
<right-angle-bracket> ::= >
<empty> ::=
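As a reading aid for the grammar, the following Python sketch tokenizes a plug-in file into tags and attributes. It is a simplified illustration, not a validating parser and not Sherlock's own code. The function `parse_tags` and the sample plug-in text (including its names and URL) are hypothetical. The sketch honors the three attribute forms in the grammar (single-quoted, double-quoted, and unquoted), folds tag and attribute names to lowercase because tags are case-insensitive, and records value-less attributes such as `user` as True.

```python
import re

TAG_OPEN = re.compile(r'<\s*(/?[A-Za-z]+)')
# name, then optionally = followed by "double", 'single', or unquoted text
ATTR = re.compile(r'''\s*([A-Za-z]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?''')

def parse_tags(source):
    """Return a list of (tag name, attribute dict) pairs found in source."""
    tags, pos = [], 0
    while True:
        opening = TAG_OPEN.search(source, pos)
        if opening is None:
            break
        name = opening.group(1).lower()
        pos = opening.end()
        attrs = {}
        while True:
            attr = ATTR.match(source, pos)
            if attr is None:
                break
            # Quoted values may contain ">" (for example "</A>"), so they are
            # consumed whole here, before looking for the end of the tag.
            value = next((g for g in attr.groups()[1:] if g is not None), True)
            attrs[attr.group(1).lower()] = value
            pos = attr.end()
        end = source.find(">", pos)
        if end < 0:
            raise ValueError("unterminated tag: " + name)
        pos = end + 1
        tags.append((name, attrs))
    return tags

# Hypothetical plug-in text used only to exercise the sketch.
PLUGIN = """<SEARCH name="Clarus Search" method=get action='http://clarus.apple.com/search'>
<INPUT name="query" user>
<INTERPRET resultListStart="</A>" resultListEnd="</BODY>" skipLocal=true>
</SEARCH>"""

for tag, attributes in parse_tags(PLUGIN):
    print(tag, attributes)
```

The sketch accepts more than the grammar allows (for instance, it does not check that INPUT and INTERPRET appear inside a SEARCH block), so it is suited to inspecting plug-in files rather than validating them.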
AppleScript Support

The new search facilities provided by the Sherlock
application can be accessed from AppleScript scripts.
AppleScript scripts can ask the Sherlock application to
perform an Internet search using one or more Internet Search
Sites or search for files with specific content on local or
remote volumes. Each of these commands returns the results
of the search as a string that can be used elsewhere in your
script. Optionally, AppleScript scripts can ask the Sherlock
application to display the results of the search. |
Searching the Internet

Internet-based searches use the “search Internet” command. The “search Internet” command allows AppleScript scripts to specify the Internet search sites that will be used in the search along with query information. The query information can be provided as either a string or as a reference to a file containing the query information (but not both). Results of the search are returned as a string, and it is possible to specify that the Sherlock application display the results. Definition 1 includes the “search Internet” entry from the Sherlock application’s AppleScript dictionary.
|
Definition 1. The "search Internet" dictionary entry from the Sherlock application. |
It is important to remember that the “for” and “using” parameters are mutually exclusive and cannot be used together in one command. Either the query information is provided as a string or it is provided in a file. If the display parameter is true, then the Sherlock application will display the results of the search. The “using” parameter allows query information stored in a file to be used rather than a query string. To create such a file, use the “Save Search Criteria” command in the Sherlock application’s File menu. The direct object to this command is a list of Internet search site names. If the list of Internet search site names is not specified and the “for string” parameter is used, then the same sites that were used in the last Internet search will be used in the search. The list of Internet sites is ignored when the “using alias” parameter is specified. |
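The parameter rules above can be summarized in a short Python sketch that assembles the text of a “search Internet” command. It only builds a string and does not run anything. The exact AppleScript wording it produces is an assumption based on the parameter names mentioned in this section (“for”, “using”, “display”), so check it against the dictionary entry in Definition 1 before relying on it. The application name "Sherlock", the site names, and the function `build_search_internet` are likewise assumptions made for illustration.

```python
def applescript_string(text):
    """Quote text as an AppleScript string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def build_search_internet(sites=(), query=None, criteria_file=None,
                          display=False, app_name="Sherlock"):
    """Assemble a 'search Internet' command (sketch; syntax is assumed)."""
    # "for" and "using" are mutually exclusive: exactly one must be given.
    if (query is None) == (criteria_file is None):
        raise ValueError("give exactly one of query ('for') or criteria_file ('using')")
    parts = ["search Internet"]
    # The site list is ignored when a saved criteria file is used, so omit it.
    if sites and criteria_file is None:
        parts.append("{" + ", ".join(applescript_string(s) for s in sites) + "}")
    if query is not None:
        parts.append("for " + applescript_string(query))
    else:
        parts.append("using alias " + applescript_string(criteria_file))
    if display:
        parts.append("display true")
    return "tell application " + applescript_string(app_name) + " to " + " ".join(parts)

print(build_search_internet(["Site A", "Site B"], query="coffee", display=True))
```

Omitting the site list together with a query string corresponds to the case described above in which the sites from the last Internet search are reused.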
Selecting Search Sites

Sherlock provides an AppleScript command allowing you to select the search sites that will be used in the next Internet search. With Sherlock 2, an additional parameter has been added to the select search sites command, allowing you to select a set of search sites that will be used within a particular channel.
|
Definition 2. The "select search sites" dictionary entry from the Sherlock application.
Searching Files

Two AppleScript commands are provided for access to the Find By Content facilities in the Sherlock application. The first command allows AppleScript scripts to perform searches based on the contents of files, and the second allows AppleScript scripts to create or update the index files on particular volumes that are used by Find By Content. The AppleScript dictionary entry for the “search” command is shown in Definition 3, and the “index volumes” command is shown in Definition 4.
|
Definition 3. The "search" dictionary entry from the Sherlock application. |
In the “search” command, the parameters “for,” “similar to,” and “using” are mutually exclusive parameters and may not be used together in the same command. As in the Internet search command, the “using” parameter allows query information stored in a file to be used rather than a query string. To create such a file, use the “Save Search Criteria” command in the Sherlock application’s File menu. The direct object to the “search” command is a list of volumes or folders to search. If no list of volumes is provided and either the “search for” or the “search similar to” parameter is used, then the “search” command will search all local, indexed volumes. When the “using” parameter is specified, the list of volumes is ignored. |
Indexing Volumes

Before the Find By Content facilities can be used to search a volume, the volume must contain an index. Index files are stored in an invisible folder called “TheFindByContentFolder” located in a volume’s root directory, and they contain the information necessary for performing content-based searches. AppleScript scripts can ask the Sherlock application to either update or create an index file for one or more volumes.
|
Definition 4. The "index volumes" dictionary entry from the Sherlock application. |
Indexing Containers

Sherlock 2 adds a new AppleScript feature allowing callers to re-index particular folders or files without having to index an entire volume. This feature is not available with the original version of Sherlock. Scripts attempting to use this feature with older versions of Sherlock will fail.
|
Definition 5. The “index containers” dictionary entry from the Sherlock 2 application. |
Search Channels

Sherlock 2 adds the concept of search channels. To allow script writers full access to this new facility, a new “channel” class has been added to Sherlock’s AppleScript suite. Scripts can use this new class to find out what channels are available, get and set the current channel, and refer to channels in search commands. Here are some examples of commands that can be used with channels:

    count channels
    exists channel "Internet"
    get channels
    get name of channels
    get all search sites of channel "Internet"
    get current channel
    set current channel to channel "Internet"
The Optional kAEOpenDocuments Apple Event Parameter

To provide applications with information useful in
selecting and displaying parts of documents in which users are
most likely interested, when the user opens a file that was
located by way of a content-based search from within one of
the Sherlock application’s windows, the Sherlock application
will insert information about the search that led to the
file into the This type of |
OSErr GetSearchWordsFromAppleEvent(AppleEvent* inAppleEvent,
        char* theText, long *ioLength) {
    OSErr err;
    DescType outType;
    AERecord propData = {typeNull, NULL};

        /* validate our parameters */
    if (ioLength == NULL || theText == NULL) return paramErr;

        /* get the property data from the Apple event */
    err = AEGetParamDesc(inAppleEvent, keyAEPropData, typeAERecord, &propData);

        /* extract the search words information */
    if (err == noErr)
        err = AEGetKeyPtr(&propData, 'srwd', typeChar, &outType,
                theText, *ioLength, ioLength);

        /* clean up and return */
    AEDisposeDesc(&propData);
    return err;
}
Listing 5. Retrieving the search words from an 'odoc' Apple event.
|
The example shown in Listing 5 illustrates how an
application may extract the query information from an
Note: It is possible for The presence of this additional parameter will not affect
the behavior of existing applications built according to the
guidelines set forth in the “Responding to Apple Events”
chapter of Inside Macintosh: Interapplication
Communication. However, developers may choose to take
advantage of this new information when it is present in an
Apple event as a clue pointing to the part of the document
that the user would like to see first. (The presence of the
In some cases, however, it is possible that some or all
of the words in the query string may not appear in the
document being opened. In a normal search based on a query
phrase, Find By Content will locate files that contain one
or more of the words in the query. But, when a user selects
one or more documents found in a previous search and
requests “similar” documents, then it is possible that some
of the documents found may not contain any of the words from
the query string specified in the original search.
Developers accessing the |
|
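Once an application has the search words, it still has to allow for the case described above, in which none of the words occur in the document. The following Python sketch illustrates one way to handle this. It is not taken from Sherlock or from any Apple sample. It assumes the search words arrive as a single text string with words separated by white space, which this note does not specify, and the function `first_hit` is a hypothetical helper.

```python
def first_hit(document_text, search_words):
    """Return (offset, word) for the earliest search word found, else None.

    Matching is case-insensitive. A result of None means the application
    should fall back to its normal behavior, such as showing the top of
    the document.
    """
    lowered = document_text.lower()
    best = None
    for word in search_words.split():
        offset = lowered.find(word.lower())
        if offset >= 0 and (best is None or offset < best[0]):
            best = (offset, word)
    return best

hit = first_hit("Clarus the Dogcow says moof", "moof dogcow")
if hit is None:
    print("no search word found; show the start of the document")
else:
    print("scroll to offset %d (matched %r)" % hit)
```

Returning None rather than an error keeps the opening of the document unaffected when the search words are absent or unhelpful.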
Acknowledgments

Thanks to David Casseres, Pete Gontier, Tim Holmes, Ingrid Kelly, Michael J. Kobb, Eric Koebler, Alice Li, and Wayne Loofbourrow.